Dataset statistics
| Number of variables | 27 |
|---|---|
| Number of observations | 9006 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.9 MiB |
| Average record size in memory | 216.0 B |
Variable types
| CAT | 18 |
|---|---|
| NUM | 7 |
| DATE | 1 |
| BOOL | 1 |
Warnings
| Warning | Type |
|---|---|
| crash_time has a high cardinality: 1259 distinct values | High cardinality |
| on_street_name has a high cardinality: 2525 distinct values | High cardinality |
| off_street_name has a high cardinality: 1694 distinct values | High cardinality |
| number_of_cyclist_killed is highly correlated with number_of_persons_killed | High correlation |
| number_of_persons_killed is highly correlated with number_of_cyclist_killed | High correlation |
| df_index has unique values | Unique |
| number_of_persons_injured has 6000 (66.6%) zeros | Zeros |
| number_of_pedestrians_injured has 8342 (92.6%) zeros | Zeros |
| number_of_motorist_injured has 7182 (79.7%) zeros | Zeros |
Reproduction
| Analysis started | 2020-12-11 10:22:58.586598 |
|---|---|
| Analysis finished | 2020-12-11 10:24:10.568252 |
| Duration | 1 minute and 11.98 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
df_index
Real number (ℝ≥0)
| Distinct | 9006 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5131.27515 |
|---|---|
| Minimum | 2 |
| Maximum | 9999 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 70.4 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5th percentile | 763.25 |
| Q1 | 2694.25 |
| Median | 5134.5 |
| Q3 | 7571.75 |
| 95th percentile | 9519.75 |
| Maximum | 9999 |
| Range | 9997 |
| Interquartile range (IQR) | 4877.5 |
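The quartiles in this table fall on fractional values (Q1 = 2694.25, Q3 = 7571.75), which is consistent with linear interpolation between adjacent order statistics, the default of pandas' `Series.quantile`. The sketch below assumes that convention; `quantile_linear` is a hypothetical helper, not the profiler's code, and the sample is the first six df_index values shown in the first rows of the dataset.

```python
import math


def quantile_linear(values, q):
    """Return the q-quantile of values, interpolating linearly between order statistics.

    Assumed convention (pandas' default, interpolation="linear"): the quantile sits at
    position (n - 1) * q of the sorted data, and a fractional position is interpolated.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    data = sorted(values)
    pos = (len(data) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    # Step from the lower neighbour towards the upper one by the fractional part.
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


# First six df_index values from the first rows of the dataset.
sample = [2, 5, 9, 10, 12, 18]
q1 = quantile_linear(sample, 0.25)
median = quantile_linear(sample, 0.5)
q3 = quantile_linear(sample, 0.75)
print(f"Q1={q1} median={median} Q3={q3} IQR={q3 - q1}")
```

For this sample the quartiles are 6.0, 9.5 and 11.5, so the IQR is 5.5. The standard library's `statistics.quantiles(sample, n=4, method="inclusive")` follows the same rule.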
Descriptive statistics
| Standard deviation | 2813.937984 |
|---|---|
| Coefficient of variation (CV) | 0.5483896111 |
| Kurtosis | -1.19457331 |
| Mean | 5131.27515 |
| Median Absolute Deviation (MAD) | 2439 |
| Skewness | -8.643233311e-05 |
| Sum | 46212264 |
| Variance | 7918246.977 |
| Monotonicity | Strictly increasing |
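The report does not state its conventions for these statistics. The MAD of 0 reported for number_of_persons_injured (66.6% zeros) fits the median absolute deviation, median(|x - median(x)|), which is 0 whenever more than half of the values equal the median. The sketch below assumes that definition, the sample (n - 1) variance, and CV = standard deviation / mean; `describe` is a hypothetical helper, not the profiler's implementation.

```python
import statistics


def describe(values):
    """Descriptive statistics under the conventions assumed above (hypothetical helper)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    med = statistics.median(values)
    return {
        "mean": mean,
        "std": std,
        "variance": statistics.variance(values),  # sample variance, n - 1 denominator
        "cv": std / mean,  # coefficient of variation; undefined when the mean is 0
        "mad": statistics.median(abs(x - med) for x in values),  # median absolute deviation
        "range": max(values) - min(values),
        "sum": sum(values),
    }


# A small zero-heavy sample: more than half the values equal the median (0), so MAD is 0.
print(describe([0, 0, 0, 0, 1, 1, 2]))
```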
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 1 | < 0.1% |
| 3387 | 1 | < 0.1% |
| 9542 | 1 | < 0.1% |
| 3395 | 1 | < 0.1% |
| 1346 | 1 | < 0.1% |
| 7489 | 1 | < 0.1% |
| 5440 | 1 | < 0.1% |
| 9534 | 1 | < 0.1% |
| 1338 | 1 | < 0.1% |
| 7497 | 1 | < 0.1% |
| Other values (8996) | 8996 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 10 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9999 | 1 | < 0.1% |
| 9998 | 1 | < 0.1% |
| 9997 | 1 | < 0.1% |
| 9996 | 1 | < 0.1% |
| 9995 | 1 | < 0.1% |
crash_date
Date
| Distinct | 116 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
| Minimum | 2017-01-17 00:00:00 |
|---|---|
| Maximum | 2020-12-04 00:00:00 |
crash_time
Categorical
| Distinct | 1259 |
|---|---|
| Distinct (%) | 14.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 00:00:00 | 147 | 1.6% |
| 15:00:00 | 97 | 1.1% |
| 13:00:00 | 96 | 1.1% |
| 17:00:00 | 95 | 1.1% |
| 19:00:00 | 93 | 1.0% |
| 14:00:00 | 93 | 1.0% |
| 12:00:00 | 92 | 1.0% |
| 18:00:00 | 91 | 1.0% |
| 16:00:00 | 89 | 1.0% |
| 14:30:00 | 85 | 0.9% |
| Other values (1249) | 8028 | 89.1% |
Unique
| Unique | 259 |
|---|---|
| Unique (%) | 2.9% |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
borough
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unknown | 2974 | 33.0% |
| Brooklyn | 2079 | 23.1% |
| Queens | 1623 | 18.0% |
| Bronx | 1245 | 13.8% |
| Manhattan | 872 | 9.7% |
| Staten Island | 213 | 2.4% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 13 |
|---|---|
| Median length | 7 |
| Mean length | 7.109704641 |
| Min length | 5 |
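The Length statistics are taken over every row rather than over the distinct values, so frequent values dominate them. As a consistency check, the borough figures (median 7, mean 7.109704641) can be rebuilt from the value counts in the table above; a minimal sketch with a hypothetical `length_stats` helper:

```python
import statistics

# Row counts per value, copied from the borough table above.
borough_counts = {
    "Unknown": 2974,
    "Brooklyn": 2079,
    "Queens": 1623,
    "Bronx": 1245,
    "Manhattan": 872,
    "Staten Island": 213,
}


def length_stats(counts):
    """String-length statistics over all rows, rebuilt from value counts (hypothetical helper)."""
    total_rows = sum(counts.values())
    # One length per row, so frequent values weigh more than rare ones.
    lengths = sorted(len(value) for value, count in counts.items() for _ in range(count))
    return {
        "max": lengths[-1],
        "median": statistics.median(lengths),
        "mean": sum(len(value) * count for value, count in counts.items()) / total_rows,
        "min": lengths[0],
    }


print(length_stats(borough_counts))
```

The mean works out to 64030 characters over 9006 rows.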
zip_code
Real number (ℝ)
| Distinct | 182 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7300.198867 |
|---|---|
| Minimum | -1 |
| Maximum | 11697 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 70.4 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5th percentile | -1 |
| Q1 | -1 |
| Median | 10458 |
| Q3 | 11223 |
| 95th percentile | 11422 |
| Maximum | 11697 |
| Range | 11698 |
| Interquartile range (IQR) | 11224 |
Descriptive statistics
| Standard deviation | 5144.080693 |
|---|---|
| Coefficient of variation (CV) | 0.7046493919 |
| Kurtosis | -1.48043454 |
| Mean | 7300.198867 |
| Median Absolute Deviation (MAD) | 779 |
| Skewness | -0.7013272898 |
| Sum | 65745591 |
| Variance | 26461566.17 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -1 | 2974 | 33.0% |
| 11207 | 152 | 1.7% |
| 11236 | 121 | 1.3% |
| 11208 | 115 | 1.3% |
| 11212 | 108 | 1.2% |
| 10467 | 95 | 1.1% |
| 11226 | 92 | 1.0% |
| 11203 | 89 | 1.0% |
| 11385 | 88 | 1.0% |
| 10457 | 84 | 0.9% |
| Other values (172) | 5088 | 56.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -1 | 2974 | 33.0% |
| 10000 | 4 | < 0.1% |
| 10001 | 28 | 0.3% |
| 10002 | 44 | 0.5% |
| 10003 | 27 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 11697 | 2 | < 0.1% |
| 11694 | 4 | < 0.1% |
| 11693 | 12 | 0.1% |
| 11692 | 8 | 0.1% |
| 11691 | 48 | 0.5% |
latitude
Real number (ℝ≥0)
| Distinct | 7248 |
|---|---|
| Distinct (%) | 80.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.72793471 |
|---|---|
| Minimum | 40.507267 |
| Maximum | 40.9109 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 70.4 KiB |
Quantile statistics
| Minimum | 40.507267 |
|---|---|
| 5th percentile | 40.599623 |
| Q1 | 40.6659715 |
| Median | 40.7168915 |
| Q3 | 40.8020705 |
| 95th percentile | 40.8657445 |
| Maximum | 40.9109 |
| Range | 0.403633 |
| Interquartile range (IQR) | 0.136099 |
Descriptive statistics
| Standard deviation | 0.08376902426 |
|---|---|
| Coefficient of variation (CV) | 0.00205679529 |
| Kurtosis | -0.8907520713 |
| Mean | 40.72793471 |
| Median Absolute Deviation (MAD) | 0.0599115 |
| Skewness | 0.1615208588 |
| Sum | 366795.78 |
| Variance | 0.007017249426 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 40.861862 | 13 | 0.1% |
| 40.84519 | 9 | 0.1% |
| 40.84518 | 9 | 0.1% |
| 40.675735 | 8 | 0.1% |
| 40.76635 | 8 | 0.1% |
| 40.820305 | 8 | 0.1% |
| 40.65616 | 7 | 0.1% |
| 40.651974 | 6 | 0.1% |
| 40.733536 | 6 | 0.1% |
| 40.66496 | 6 | 0.1% |
| Other values (7238) | 8926 | 99.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 40.507267 | 1 | < 0.1% |
| 40.511734 | 1 | < 0.1% |
| 40.516624 | 1 | < 0.1% |
| 40.519127 | 1 | < 0.1% |
| 40.519722 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 40.9109 | 1 | < 0.1% |
| 40.91076 | 1 | < 0.1% |
| 40.91038 | 1 | < 0.1% |
| 40.91032 | 1 | < 0.1% |
| 40.909607 | 1 | < 0.1% |
longitude
Real number (ℝ)
| Distinct | 6864 |
|---|---|
| Distinct (%) | 76.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -73.91336268 |
|---|---|
| Minimum | -74.23878 |
| Maximum | -73.70174 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 70.4 KiB |
Quantile statistics
| Minimum | -74.23878 |
|---|---|
| 5th percentile | -74.02221875 |
| Q1 | -73.9586965 |
| Median | -73.91749 |
| Q3 | -73.8680625 |
| 95th percentile | -73.7626825 |
| Maximum | -73.70174 |
| Range | 0.53704 |
| Interquartile range (IQR) | 0.090634 |
Descriptive statistics
| Standard deviation | 0.08299945637 |
|---|---|
| Coefficient of variation (CV) | -0.001122928972 |
| Kurtosis | 1.211668832 |
| Mean | -73.91336268 |
| Median Absolute Deviation (MAD) | 0.045295 |
| Skewness | -0.3017397742 |
| Sum | -665663.7443 |
| Variance | 0.006888909757 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -73.91282 | 14 | 0.2% |
| -73.9112 | 11 | 0.1% |
| -73.89686 | 9 | 0.1% |
| -73.91417 | 9 | 0.1% |
| -73.89083 | 8 | 0.1% |
| -73.76736 | 7 | 0.1% |
| -73.919106 | 7 | 0.1% |
| -73.897736 | 7 | 0.1% |
| -73.94194 | 7 | 0.1% |
| -73.86542 | 6 | 0.1% |
| Other values (6854) | 8921 | 99.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -74.23878 | 1 | < 0.1% |
| -74.23591 | 1 | < 0.1% |
| -74.235115 | 1 | < 0.1% |
| -74.23486 | 1 | < 0.1% |
| -74.230446 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -73.70174 | 1 | < 0.1% |
| -73.70212 | 1 | < 0.1% |
| -73.70259 | 1 | < 0.1% |
| -73.70362 | 1 | < 0.1% |
| -73.70631 | 1 | < 0.1% |
on_street_name
Categorical
| Distinct | 2525 |
|---|---|
| Distinct (%) | 28.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Belt Parkway | 168 | 1.9% |
| Broadway | 99 | 1.1% |
| Brooklyn Queens Expressway | 90 | 1.0% |
| Long Island Expressway | 83 | 0.9% |
| Cross Bronx Expy | 82 | 0.9% |
| Atlantic Avenue | 82 | 0.9% |
| Major Deegan Expressway | 79 | 0.9% |
| Fdr Drive | 77 | 0.9% |
| Grand Central Pkwy | 77 | 0.9% |
| 3 Avenue | 70 | 0.8% |
| Other values (2515) | 8099 | 89.9% |
Unique
| Unique | 1329 |
|---|---|
| Unique (%) | 14.8% |
Length
| Max length | 32 |
|---|---|
| Median length | 14 |
| Mean length | 14.46824339 |
| Min length | 6 |
off_street_name
Categorical
| Distinct | 1694 |
|---|---|
| Distinct (%) | 18.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unknown | 4869 | 54.1% |
| 3 Avenue | 39 | 0.4% |
| Broadway | 38 | 0.4% |
| 2 Avenue | 34 | 0.4% |
| 4 Avenue | 24 | 0.3% |
| 5 Avenue | 24 | 0.3% |
| Queens Boulevard | 20 | 0.2% |
| Atlantic Avenue | 20 | 0.2% |
| Park Avenue | 19 | 0.2% |
| Linden Boulevard | 17 | 0.2% |
| Other values (1684) | 3902 | 43.3% |
Unique
| Unique | 884 |
|---|---|
| Unique (%) | 9.8% |
Length
| Max length | 29 |
|---|---|
| Median length | 7 |
| Mean length | 9.774483678 |
| Min length | 6 |
number_of_persons_injured
Real number (ℝ≥0)
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4384854541 |
|---|---|
| Minimum | 0 |
| Maximum | 10 |
| Zeros | 6000 |
| Zeros (%) | 66.6% |
| Memory size | 70.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 1 |
| 95th percentile | 2 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.7623439804 |
|---|---|
| Coefficient of variation (CV) | 1.738584423 |
| Kurtosis | 13.39461939 |
| Mean | 0.4384854541 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.781649591 |
| Sum | 3949 |
| Variance | 0.5811683444 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6000 | 66.6% |
| 1 | 2401 | 26.7% |
| 2 | 400 | 4.4% |
| 3 | 124 | 1.4% |
| 4 | 53 | 0.6% |
| 5 | 15 | 0.2% |
| 6 | 6 | 0.1% |
| 7 | 5 | 0.1% |
| 10 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6000 | 66.6% |
| 1 | 2401 | 26.7% |
| 2 | 400 | 4.4% |
| 3 | 124 | 1.4% |
| 4 | 53 | 0.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7 | 5 | 0.1% |
| 6 | 6 | 0.1% |
| 5 | 15 | 0.2% |
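Because number_of_persons_injured takes only ten distinct values, its summary statistics can be recomputed exactly from the common-values table. The sketch below assumes the sample (n - 1) variance, the convention that reproduces the reported figures; the variable names are illustrative.

```python
import math

# Value -> row count, copied from the number_of_persons_injured common-values table.
counts = {0: 6000, 1: 2401, 2: 400, 3: 124, 4: 53, 5: 15, 6: 6, 7: 5, 8: 1, 10: 1}

n = sum(counts.values())  # number of rows
total = sum(value * count for value, count in counts.items())  # the "Sum" statistic
mean = total / n
# Sum of squared deviations, each value weighted by how often it occurs.
squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = squared_dev / (n - 1)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(f"n={n} sum={total} mean={mean:.10f} variance={variance:.10f} std={std:.10f} CV={cv:.9f}")
```

Up to rounding, the printed values agree with the Descriptive statistics table for this variable (Sum 3949, Mean 0.4384854541, Variance 0.5811683444, Standard deviation 0.7623439804, CV 1.738584423).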
number_of_persons_killed
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 9002 | > 99.9% |
| 3 | 4 | < 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
number_of_pedestrians_injured
Real number (ℝ≥0)
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.07583833 |
|---|---|
| Minimum | 0 |
| Maximum | 4 |
| Zeros | 8342 |
| Zeros (%) | 92.6% |
| Memory size | 70.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0 |
| 95th percentile | 1 |
| Maximum | 4 |
| Range | 4 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.2742315335 |
|---|---|
| Coefficient of variation (CV) | 3.616001744 |
| Kurtosis | 16.15873294 |
| Mean | 0.07583833 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.732437796 |
| Sum | 683 |
| Variance | 0.07520293399 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8342 | 92.6% |
| 1 | 648 | 7.2% |
| 2 | 14 | 0.2% |
| 4 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8342 | 92.6% |
| 1 | 648 | 7.2% |
| 2 | 14 | 0.2% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 2 | 14 | 0.2% |
| 1 | 648 | 7.2% |
| 0 | 8342 | 92.6% |
number_of_pedestrians_killed
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8996 | 99.9% |
| 1 | 9 | 0.1% |
| 2 | 1 | < 0.1% |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
number_of_cyclist_injured
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8481 | 94.2% |
| 1 | 511 | 5.7% |
| 2 | 14 | 0.2% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
number_of_cyclist_killed
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 9002 | > 99.9% |
| 1 | 4 | < 0.1% |
number_of_motorist_injured
Real number (ℝ≥0)
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.3027981346 |
|---|---|
| Minimum | 0 |
| Maximum | 10 |
| Zeros | 7182 |
| Zeros (%) | 79.7% |
| Memory size | 70.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0 |
| 95th percentile | 2 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.7318615064 |
|---|---|
| Coefficient of variation (CV) | 2.416994766 |
| Kurtosis | 18.49006004 |
| Mean | 0.3027981346 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.555934948 |
| Sum | 2727 |
| Variance | 0.5356212645 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7182 | 79.7% |
| 1 | 1254 | 13.9% |
| 2 | 368 | 4.1% |
| 3 | 123 | 1.4% |
| 4 | 51 | 0.6% |
| 5 | 15 | 0.2% |
| 6 | 6 | 0.1% |
| 7 | 5 | 0.1% |
| 10 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7182 | 79.7% |
| 1 | 1254 | 13.9% |
| 2 | 368 | 4.1% |
| 3 | 123 | 1.4% |
| 4 | 51 | 0.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7 | 5 | 0.1% |
| 6 | 6 | 0.1% |
| 5 | 15 | 0.2% |
number_of_motorist_killed
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8995 | 99.9% |
| 1 | 9 | 0.1% |
| 2 | 2 | < 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
contributing_factor_vehicle_1
Categorical
| Distinct | 50 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 2504 | 27.8% |
| Driver Inattention/Distraction | 2267 | 25.2% |
| Failure To Yield Right-Of-Way | 597 | 6.6% |
| Following Too Closely | 481 | 5.3% |
| Passing Too Closely | 324 | 3.6% |
| Passing Or Lane Usage Improper | 311 | 3.5% |
| Unsafe Speed | 309 | 3.4% |
| Backing Unsafely | 269 | 3.0% |
| Other Vehicular | 249 | 2.8% |
| Traffic Control Disregarded | 240 | 2.7% |
| Other values (40) | 1455 | 16.2% |
Unique
| Unique | 8 |
|---|---|
| Unique (%) | 0.1% |
Length
| Max length | 53 |
|---|---|
| Median length | 20 |
| Mean length | 20.91605596 |
| Min length | 5 |
contributing_factor_vehicle_2
Categorical
| Distinct | 28 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 8032 | 89.2% |
| Driver Inattention/Distraction | 361 | 4.0% |
| Other Vehicular | 106 | 1.2% |
| Following Too Closely | 93 | 1.0% |
| Failure To Yield Right-Of-Way | 67 | 0.7% |
| Passing Too Closely | 46 | 0.5% |
| Passing Or Lane Usage Improper | 42 | 0.5% |
| Unsafe Speed | 36 | 0.4% |
| Traffic Control Disregarded | 34 | 0.4% |
| Unsafe Lane Changing | 24 | 0.3% |
| Other values (18) | 165 | 1.8% |
Unique
| Unique | 4 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 53 |
|---|---|
| Median length | 11 |
| Mean length | 12.52442816 |
| Min length | 11 |
contributing_factor_vehicle_3
Categorical
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 8942 | 99.3% |
| Other Vehicular | 21 | 0.2% |
| Driver Inattention/Distraction | 14 | 0.2% |
| Following Too Closely | 14 | 0.2% |
| Obstruction/Debris | 4 | < 0.1% |
| Driver Inexperience | 2 | < 0.1% |
| Unsafe Speed | 2 | < 0.1% |
| Aggressive Driving/Road Rage | 1 | < 0.1% |
| Pavement Slippery | 1 | < 0.1% |
| Passing Or Lane Usage Improper | 1 | < 0.1% |
| Other values (4) | 4 | < 0.1% |
Unique
| Unique | 7 |
|---|---|
| Unique (%) | 0.1% |
Length
| Max length | 30 |
|---|---|
| Median length | 11 |
| Mean length | 11.0700644 |
| Min length | 11 |
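In these summaries, Unique counts the values that occur in exactly one row, which is different from Distinct (the number of different values). For contributing_factor_vehicle_3 this can be checked against the table: three listed values have a count of 1, and the four values folded into "Other values (4)" share 4 rows, so each of them occurs once, giving 3 + 4 = 7. In the sketch below the `hidden_value_*` keys are hypothetical placeholders for those four unnamed values, and `distinct_and_unique` is a hypothetical helper.

```python
# Row counts from the contributing_factor_vehicle_3 table above. The report hides the
# names behind "Other values (4)"; since those four values share 4 rows, each occurs
# once. The "hidden_value_*" keys are hypothetical placeholders for them.
cf3_counts = {
    "Unspecified": 8942,
    "Other Vehicular": 21,
    "Driver Inattention/Distraction": 14,
    "Following Too Closely": 14,
    "Obstruction/Debris": 4,
    "Driver Inexperience": 2,
    "Unsafe Speed": 2,
    "Aggressive Driving/Road Rage": 1,
    "Pavement Slippery": 1,
    "Passing Or Lane Usage Improper": 1,
    "hidden_value_1": 1,
    "hidden_value_2": 1,
    "hidden_value_3": 1,
    "hidden_value_4": 1,
}


def distinct_and_unique(counts):
    """Return (distinct, unique, distinct %, unique %) computed from value counts."""
    rows = sum(counts.values())
    distinct = len(counts)  # number of different values
    unique = sum(1 for count in counts.values() if count == 1)  # values seen exactly once
    return distinct, unique, 100 * distinct / rows, 100 * unique / rows


distinct, unique, distinct_pct, unique_pct = distinct_and_unique(cf3_counts)
print(f"Distinct {distinct} ({distinct_pct:.1f}%), Unique {unique} ({unique_pct:.1f}%)")
```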
contributing_factor_vehicle_4
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 8986 | 99.8% |
| Other Vehicular | 9 | 0.1% |
| Driver Inattention/Distraction | 4 | < 0.1% |
| Following Too Closely | 4 | < 0.1% |
| Driver Inexperience | 1 | < 0.1% |
| Passing Or Lane Usage Improper | 1 | < 0.1% |
| Obstruction/Debris | 1 | < 0.1% |
Unique
| Unique | 3 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 30 |
|---|---|
| Median length | 11 |
| Mean length | 11.0206529 |
| Min length | 11 |
contributing_factor_vehicle_5
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 8999 | 99.9% |
| Following Too Closely | 4 | < 0.1% |
| Driver Inattention/Distraction | 1 | < 0.1% |
| Other Vehicular | 1 | < 0.1% |
| Obstruction/Debris | 1 | < 0.1% |
Unique
| Unique | 3 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 30 |
|---|---|
| Median length | 11 |
| Mean length | 11.0077726 |
| Min length | 11 |
vehicle_type_code_1
Categorical
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Sedan | 4380 | 48.6% |
| Station Wagon/Sport Utility Vehicle | 3032 | 33.7% |
| Other | 313 | 3.5% |
| Taxi | 262 | 2.9% |
| Pick-Up Truck | 204 | 2.3% |
| Box Truck | 157 | 1.7% |
| Bike | 131 | 1.5% |
| Bus | 121 | 1.3% |
| Unspecified | 112 | 1.2% |
| Tractor Truck Diesel | 81 | 0.9% |
| Other values (4) | 213 | 2.4% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 35 |
|---|---|
| Median length | 5 |
| Mean length | 15.53131246 |
| Min length | 3 |
vehicle_type_code_2
Categorical
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 3149 | 35.0% |
| Sedan | 2620 | 29.1% |
| Station Wagon/Sport Utility Vehicle | 1787 | 19.8% |
| Bike | 334 | 3.7% |
| Other | 268 | 3.0% |
| Box Truck | 167 | 1.9% |
| Pick-Up Truck | 121 | 1.3% |
| Taxi | 108 | 1.2% |
| Bus | 100 | 1.1% |
| E-Scooter | 91 | 1.0% |
| Other values (4) | 261 | 2.9% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 35 |
|---|---|
| Median length | 11 |
| Mean length | 13.34710193 |
| Min length | 3 |
vehicle_type_code_3
Categorical
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 8152 | 90.5% |
| Sedan | 428 | 4.8% |
| Station Wagon/Sport Utility Vehicle | 366 | 4.1% |
| Other | 14 | 0.2% |
| Pick-Up Truck | 13 | 0.1% |
| Taxi | 10 | 0.1% |
| Box Truck | 6 | 0.1% |
| Bus | 5 | 0.1% |
| Tractor Truck Diesel | 4 | < 0.1% |
| Bike | 3 | < 0.1% |
| Other values (2) | 5 | 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 35 |
|---|---|
| Median length | 11 |
| Mean length | 11.66899845 |
| Min length | 3 |
vehicle_type_code_4
Categorical
| Distinct | 11 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 8768 | 97.4% |
| Station Wagon/Sport Utility Vehicle | 111 | 1.2% |
| Sedan | 110 | 1.2% |
| Pick-Up Truck | 6 | 0.1% |
| Bike | 3 | < 0.1% |
| Other | 2 | < 0.1% |
| Taxi | 2 | < 0.1% |
| Box Truck | 1 | < 0.1% |
| Motorcycle | 1 | < 0.1% |
| Tractor Truck Diesel | 1 | < 0.1% |
| Other values (1) | 1 | < 0.1% |
Unique
| Unique | 4 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 35 |
|---|---|
| Median length | 11 |
| Mean length | 11.21840995 |
| Min length | 3 |
vehicle_type_code_5
Categorical
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.4 KiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Unspecified | 8920 | 99.0% |
| Sedan | 42 | 0.5% |
| Station Wagon/Sport Utility Vehicle | 37 | 0.4% |
| Pick-Up Truck | 2 | < 0.1% |
| Taxi | 2 | < 0.1% |
| Other | 1 | < 0.1% |
| Bike | 1 | < 0.1% |
| Motorcycle | 1 | < 0.1% |
Unique
| Unique | 3 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 35 |
|---|---|
| Median length | 11 |
| Mean length | 11.0679547 |
| Min length | 4 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the slope of the line does not affect r: any positive slope gives r = +1 and any negative slope gives r = -1. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
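The recipe above (covariance divided by the product of the standard deviations) can be written in a few lines of plain Python. This is a minimal sketch with a hypothetical `pearson_r` helper, not the profiler's implementation; it also shows that the slope of an exact linear relationship affects r only through its sign.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares; the common 1/(n - 1) factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * v + 2 for v in x]))    # steep positive line
print(pearson_r(x, [0.1 * v - 7 for v in x]))  # shallow positive line
print(pearson_r(x, [-4 * v for v in x]))       # negative slope
```

Up to floating-point rounding, the first two calls print 1 and the third prints -1.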
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
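A minimal sketch of that recipe in plain Python, assuming average ranks for ties (the usual convention, and the default of `scipy.stats.rankdata`); the helper names are illustrative. On a monotonic but nonlinear relationship ρ reaches 1 while Pearson's r stays below 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations (the 1/(n - 1) factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(spearman_rho(x, y), pearson_r(x, y))
```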
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations.
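The pair-counting definition above (often called τ-a) translates directly into an O(n²) loop. This is a sketch with a hypothetical `kendall_tau_a` helper, not necessarily what a profiler runs: tie-corrected variants such as τ-b, the default of `scipy.stats.kendalltau`, use a different denominator when either variable has ties, and agree with τ-a when there are none.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau as described above: (concordant - discordant) / total pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 means a tie in x or y: the pair is neither concordant nor discordant
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # one discordant pair out of six
```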
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available for φk.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | df_index | crash_date | crash_time | borough | zip_code | latitude | longitude | on_street_name | off_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | contributing_factor_vehicle_4 | contributing_factor_vehicle_5 | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | vehicle_type_code_4 | vehicle_type_code_5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2020-12-03 | 13:37:00 | Unknown | -1 | 40.798504 | -73.967125 | West 103 Street | Unknown | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified |
| 1 | 5 | 2020-12-02 | 19:00:00 | Unknown | -1 | 40.731167 | -73.709940 | 256 Street | 87 Avenue | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Taxi | Unspecified | Unspecified | Unspecified | Unspecified |
| 2 | 9 | 2020-11-30 | 09:40:00 | Queens | 11375 | 40.735550 | -73.850970 | 62-60 108 Street | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified |
| 3 | 10 | 2020-11-29 | 05:45:00 | Unknown | -1 | 40.701527 | -73.989570 | Brooklyn Queens Expressway | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Fell Asleep | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified |
| 4 | 12 | 2020-11-26 | 23:30:00 | Unknown | -1 | 40.700108 | -73.953830 | Wallabout Street | Unknown | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Sedan | Unspecified | Unspecified | Unspecified |
| 5 | 18 | 2020-11-23 | 11:28:00 | Brooklyn | 11215 | 40.668293 | -73.979240 | 6 Street | Unknown | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified |
| 6 | 21 | 2020-11-22 | 20:10:00 | Unknown | -1 | 40.624640 | -74.141670 | Forest Avenue | Decker Avenue | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified |
| 7 | 27 | 2020-11-20 | 12:00:00 | Unknown | -1 | 40.677483 | -73.930330 | Utica Avenue | Unknown | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Other Vehicular | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Taxi | Bike | Unspecified | Unspecified | Unspecified |
| 8 | 33 | 2020-11-18 | 11:00:00 | Manhattan | 10010 | 40.736706 | -73.978220 | East 23 Street | Unknown | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Turning Improperly | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified |
| 9 | 35 | 2017-01-17 | 03:02:00 | Unknown | -1 | 40.608757 | -74.038086 | Verrazano Bridge Lower | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified |
Last rows
| | df_index | crash_date | crash_time | borough | zip_code | latitude | longitude | on_street_name | off_street_name | number_of_persons_injured | number_of_persons_killed | number_of_pedestrians_injured | number_of_pedestrians_killed | number_of_cyclist_injured | number_of_cyclist_killed | number_of_motorist_injured | number_of_motorist_killed | contributing_factor_vehicle_1 | contributing_factor_vehicle_2 | contributing_factor_vehicle_3 | contributing_factor_vehicle_4 | contributing_factor_vehicle_5 | vehicle_type_code_1 | vehicle_type_code_2 | vehicle_type_code_3 | vehicle_type_code_4 | vehicle_type_code_5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8996 | 9990 | 2020-11-02 | 12:20:00 | Queens | 11434 | 40.656160 | -73.76736 | Rockaway Boulevard | Brewer Boulevard | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified | Unspecified | Unspecified |
| 8997 | 9991 | 2020-11-04 | 18:00:00 | Bronx | 10468 | 40.860850 | -73.90545 | Aqueduct Avenue | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified | Unspecified | Unspecified |
| 8998 | 9992 | 2020-11-12 | 18:20:00 | Queens | 11433 | 40.704388 | -73.77917 | 180 Street | 105 Avenue | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified | Unspecified |
| 8999 | 9993 | 2020-11-18 | 07:30:00 | Manhattan | 10031 | 40.829020 | -73.94485 | Amsterdam Avenue | West 151 Street | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified |
| 9000 | 9994 | 2020-11-13 | 21:43:00 | Brooklyn | 11217 | 40.686500 | -73.98787 | Hoyt Street | Dean Street | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Unspecified | Unspecified | Unspecified | Unspecified |
| 9001 | 9995 | 2020-11-04 | 12:10:00 | Unknown | -1 | 40.658535 | -73.97328 | West Drive | Center Drive | 2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Bike | Unspecified | Unspecified | Unspecified | Unspecified |
| 9002 | 9996 | 2020-11-18 | 09:15:00 | Queens | 11691 | 40.598297 | -73.74828 | 14-05 New Haven Avenue | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified | Unspecified | Unspecified |
| 9003 | 9997 | 2020-11-04 | 11:24:00 | Brooklyn | 11218 | 40.640415 | -73.98593 | 39 Street | Unknown | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Box Truck | Unspecified | Unspecified | Unspecified |
| 9004 | 9998 | 2020-11-13 | 12:05:00 | Queens | 11427 | 40.735300 | -73.73681 | 82-25 234 Street | Unknown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Tire Failure/Inadequate | Unspecified | Unspecified | Unspecified | Unspecified | Station Wagon/Sport Utility Vehicle | Sedan | Station Wagon/Sport Utility Vehicle | Unspecified | Unspecified |
| 9005 | 9999 | 2020-11-03 | 08:00:00 | Queens | 11368 | 40.750225 | -73.85515 | 111 Street | 42 Avenue | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | Unspecified | Unspecified | Unspecified | Sedan | Sedan | Unspecified | Unspecified | Unspecified |